An Efficient Syntactic Tagging Tool For Corpora
Abstract
The treebank is an important resource for MT (machine translation) and linguistics research, but building one requires annotating a large number of sentences with syntactic information. When annotation is done manually, it is time-consuming and troublesome, and consistency is difficult to maintain. In this paper, we present a new technique for the semi-automatic tagging of Chinese text. The system takes Chinese text as input and outputs syntactically tagged sentences (dependency trees). We use dependency grammar and employ a stack-based shift/reduce context-dependent parser as the tagging mechanism. The system works in a human-machine cooperative way, in which the machine acquires tagging rules from human intervention. The level of automation improves step by step as rules accumulate during annotation. In addition, good consistency of tagging is guaranteed.
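The tagging mechanism described above (a stack-based shift/reduce parser whose rule base grows from human intervention) can be illustrated with a small Python sketch. The abstract does not specify the system's rule format or context features, so this is a hypothetical reconstruction, not the authors' implementation: the names `RuleTagger` and `gold_annotator`, the three-action transition set (`SHIFT`, `LEFT`, `RIGHT`, in the arc-standard style), and the choice of the POS tags of the two topmost stack items plus the next input word as the rule context are all assumptions. Whenever no stored rule matches the current context, the tagger consults an annotator callback (here simulated by an oracle that knows the correct tree) and stores the answer as a new rule.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical sketch: the rule format and context features below are
# assumptions for illustration, not the paper's actual design.
SHIFT, LEFT, RIGHT = "SHIFT", "LEFT", "RIGHT"
Context = Tuple[Optional[str], Optional[str], Optional[str]]
# Stand-in for the human annotator: (words, tags, stack, buffer, heads) -> action.
Annotator = Callable[[List[str], List[str], List[int], List[int], Dict[int, int]], str]


class RuleTagger:
    """Stack-based shift/reduce dependency tagger that learns rules from an annotator."""

    def __init__(self) -> None:
        self.rules: Dict[Context, str] = {}  # context -> action, accumulated across sentences
        self.queries = 0                     # number of times the annotator was consulted

    @staticmethod
    def context(tags: List[str], stack: List[int], buffer: List[int]) -> Context:
        # POS tags of the second-from-top and top stack items and of the next input word.
        s1 = tags[stack[-2]] if len(stack) >= 2 else None
        s0 = tags[stack[-1]] if stack else None
        b0 = tags[buffer[0]] if buffer else None
        return (s1, s0, b0)

    @staticmethod
    def legal(action: str, stack: List[int], buffer: List[int]) -> bool:
        if action == SHIFT:
            return bool(buffer)
        return action in (LEFT, RIGHT) and len(stack) >= 2

    def parse(self, words: List[str], tags: List[str], ask_human: Annotator) -> Dict[int, int]:
        """Return a dependency tree as {dependent index: head index}; -1 marks the root."""
        stack: List[int] = []
        buffer = list(range(len(words)))
        heads: Dict[int, int] = {}
        while buffer or len(stack) > 1:
            ctx = self.context(tags, stack, buffer)
            action = self.rules.get(ctx)
            if action is None:
                # No rule covers this context: ask the annotator, then remember the answer.
                action = ask_human(words, tags, stack, buffer, heads)
                if not self.legal(action, stack, buffer):
                    raise ValueError(f"illegal action {action!r} in context {ctx}")
                self.queries += 1
                self.rules[ctx] = action
            if action == SHIFT:
                stack.append(buffer.pop(0))
            elif action == LEFT:               # second-from-top becomes dependent of top
                dep = stack.pop(-2)
                heads[dep] = stack[-1]
            else:                              # RIGHT: top becomes dependent of second-from-top
                dep = stack.pop()
                heads[dep] = stack[-1]
        if stack:
            heads[stack[0]] = -1               # the last remaining word is the root
        return heads


def gold_annotator(gold: Dict[int, int]) -> Annotator:
    """Simulate an annotator who knows the correct (projective) tree."""
    def ask(words, tags, stack, buffer, heads):
        if len(stack) >= 2:
            s1, s0 = stack[-2], stack[-1]
            if gold.get(s1) == s0:
                return LEFT
            # Attach s0 to s1 only once s0 has collected all of its own dependents.
            pending = [d for d, h in gold.items() if h == s0 and d not in heads]
            if gold.get(s0) == s1 and not pending:
                return RIGHT
        return SHIFT
    return ask


if __name__ == "__main__":
    tagger = RuleTagger()
    gold = {0: 1, 1: -1, 2: 1}  # both nouns attach to the verb, which is the root
    # 我 喜欢 苹果 ("I like apples"), tagged with Penn Chinese Treebank labels.
    print(tagger.parse(["我", "喜欢", "苹果"], ["PN", "VV", "NN"], gold_annotator(gold)))
    print(tagger.queries)  # 5: every context was new
    # 他 喜欢 香蕉 ("He likes bananas") has the same tag pattern, so no questions are asked.
    print(tagger.parse(["他", "喜欢", "香蕉"], ["PN", "VV", "NN"], gold_annotator(gold)))
    print(tagger.queries)  # still 5
```

In this sketch the second sentence shares the first sentence's tag pattern, so it is parsed with no intervention at all, which mirrors the claim that automation improves as rules accumulate. A practical tool would also need a way for the annotator to override a stored rule that fires wrongly, and a richer context to keep conflicting rules apart; the abstract does not say how the original system handles either.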
Similar resources
An Annotated Corpus Management Tool: ChaKi
Large-scale annotated corpora are very important not only in linguistic research but also in practical natural language processing tasks, since a number of practical tools such as part-of-speech (POS) taggers and syntactic parsers are now corpus-based or machine-learning-based systems that require some amount of accurately annotated corpora. This article presents an annotated corpus management t...
An improved joint model: POS tagging and dependency parsing
Dependency parsing is a form of syntactic parsing of natural language that automatically analyzes the dependency structure of sentences, creating a dependency graph for each input sentence. Part-of-speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers perform POS tagging and dependency parsing in a pipeline. Unfortunately, in pipel...
SWIFT Aligner: A Tool for the Visualization and Correction of Word Alignment and for Cross-Language Transfer (O. Scrivner, T. Gilmanov)
It is well known that parallel corpora are valuable linguistic resources. One of the benefits of such corpora is that they allow for building an annotated corpus for resource-poor languages via cross-language transfer. That is, given accurate alignment between a word from a source language and its equivalent in a target language, some linguistic information, such as part-of-speech tags or sy...
KCAT: A Korean Corpus Annotating Tool Minimizing Human Intervention
While large POS (part-of-speech) annotated corpora play an important role in natural language processing, the annotated corpus requires very high accuracy and consistency. To build such an accurate and consistent corpus, we often use a manual tagging method. But manual tagging is very labor-intensive and expensive. Furthermore, it is not easy to get consistent results from the human expert...
Detection of Strange and Wrong Automatic Part-of-Speech Tagging
Automatic morphosyntactic tagging of corpora is usually imperfect. Wrong or strange tagging may be automatically repeated following some patterns. It is usually hard to manually detect all these errors, as corpora may contain millions of tags. This paper presents an approach to detect sequences of part-of-speech tags that have an internal cohesiveness in corpora. Some sequences match syntact...